Transform coefficient coding

ABSTRACT

Techniques are described for determining a scan order for transform coefficients of a block. The techniques may determine context for encoding or decoding significance syntax elements for the transform coefficients based on the determined scan order. A video encoder may encode the significance syntax elements and a video decoder may decode the significance syntax elements based on the determined contexts.

RELATED APPLICATIONS

This application claims the benefit of:

-   U.S. Provisional Application No. 61/625,039, filed Apr. 16, 2012, and
-   U.S. Provisional Application No. 61/667,382, filed Jul. 2, 2012,

the entire content of each of which is incorporated by reference herein.

TECHNICAL FIELD

This disclosure relates to video coding and more particularly to techniques for coding syntax elements associated with transform coefficients used in video coding.

BACKGROUND

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques defined according to video coding standards. Digital video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques. Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. In addition, High Efficiency Video Coding (HEVC) is a video coding standard being developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Motion Picture Experts Group (MPEG).

Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which then may be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.

SUMMARY

In general, this disclosure describes techniques for encoding and decoding data representing syntax elements (e.g., significance flags) associated with transform coefficients of a block. In some techniques, a video encoder and a video decoder each determines contexts to be used for context adaptive binary arithmetic coding (CABAC). As described in more detail, the video encoder and the video decoder determine a scan order for the block, and determine the contexts based on the scan order. In some examples, the video decoder determines contexts that are the same for two or more scan orders, and different contexts for other scan orders. Similarly, in these examples, the video encoder determines contexts that are the same for the two or more scan orders, and different contexts for the other scan orders.

In one example, the disclosure describes a method for decoding video data. The method comprises receiving, from a coded bitstream, significance flags of transform coefficients of a block, determining a scan order for the transform coefficients of the block, determining contexts for the significance flags of the transform coefficients of the block based on the determined scan order, and context adaptive binary arithmetic coding (CABAC) decoding the significance flags of the transform coefficients based at least on the determined contexts.

In another example, the disclosure describes a method for encoding video data. The method comprises determining a scan order for transform coefficients of a block, determining contexts for significance flags of the transform coefficients of the block based on the determined scan order, context adaptive binary arithmetic coding (CABAC) encoding the significance flags of the transform coefficients based at least on the determined contexts, and signaling the encoded significance flags in a coded bitstream.

In another example, the disclosure describes an apparatus for coding video data. The apparatus comprises a video coder configured to determine a scan order for transform coefficients of a block, determine contexts for significance flags of the transform coefficients of the block based on the determined scan order, and context adaptive binary arithmetic coding (CABAC) code the significance flags of the transform coefficients based at least on the determined contexts.

In another example, the disclosure describes an apparatus for coding video data. The apparatus comprises means for determining a scan order for transform coefficients of a block, means for determining contexts for significance flags of the transform coefficients of the block based on the determined scan order, and means for context adaptive binary arithmetic coding (CABAC) coding the significance flags of the transform coefficients based at least on the determined contexts.

In another example, the disclosure describes a computer-readable storage medium. The computer-readable storage medium has instructions stored thereon that, when executed, cause one or more processors of an apparatus for coding video data to determine a scan order for transform coefficients of a block, determine contexts for significance flags of the transform coefficients of the block based on the determined scan order, and context adaptive binary arithmetic coding (CABAC) code the significance flags of the transform coefficients based at least on the determined contexts.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIGS. 1A-1C are conceptual diagrams illustrating examples of scan orders of a block that includes transform coefficients.

FIG. 2 is a conceptual diagram illustrating a mapping of transform coefficients to significance syntax elements.

FIG. 3 is a block diagram illustrating an example video encoding and decoding system that may utilize techniques described in this disclosure.

FIG. 4 is a block diagram illustrating an example video encoder that may implement techniques described in this disclosure.

FIG. 5 is a block diagram illustrating an example of an entropy encoder that may implement techniques for entropy encoding syntax elements in accordance with this disclosure.

FIG. 6 is a flowchart illustrating an example process for encoding video data according to this disclosure.

FIG. 7 is a block diagram illustrating an example video decoder that may implement techniques described in this disclosure.

FIG. 8 is a block diagram illustrating an example of an entropy decoder that may implement techniques for decoding syntax elements in accordance with this disclosure.

FIG. 9 is a flowchart illustrating an example process of decoding video data according to this disclosure.

FIG. 10 is a conceptual diagram illustrating positions of a last significant coefficient depending on the scan order.

FIG. 11 is a conceptual diagram illustrating use of a diagonal scan in place of an original horizontal scan.

FIG. 12 is a conceptual diagram illustrating a context neighborhood for a nominal horizontal scan.

DETAILED DESCRIPTION

A video encoder determines transform coefficients for a block, encodes syntax elements that indicate the values of the transform coefficients using context adaptive binary arithmetic coding (CABAC), and signals the encoded syntax elements in a bitstream. A video decoder receives the bitstream that includes the encoded syntax elements that indicate the values of the transform coefficients and CABAC decodes the syntax elements to determine the transform coefficients for the block.

The video encoder and video decoder determine which contexts are to be used to perform CABAC encoding and CABAC decoding, respectively. In the techniques described in this disclosure, the video encoder and the video decoder may determine which contexts to use to perform CABAC encoding or CABAC decoding based on a scan order of the block of transform coefficients. In some examples, the video encoder and the video decoder may determine which contexts to use to perform CABAC encoding or CABAC decoding based on a size of the block, positions of the transform coefficients within the block, and the scan order.

In some examples, the video encoder and the video decoder may utilize different contexts for different scan orders (i.e., a first set of contexts for horizontal scan, a second set of contexts for vertical scan, and a third set of contexts for diagonal scan). As another example, if the block of transform coefficients is scanned vertically or horizontally, the video encoder and the video decoder may utilize the same contexts for both of these scan orders (e.g., for a particular position of a transform coefficient).

By determining which contexts to use for CABAC encoding or CABAC decoding, the techniques described in this disclosure may exploit the statistical behavior of the magnitudes of the transform coefficients in a way that achieves better video compression, as compared to other techniques. For instance, it may be possible for the video encoder and the video decoder to determine which contexts to use for CABAC encoding or CABAC decoding based on the position of the transform coefficient, irrespective of the scan order. However, the scan order may have an effect on the ordering of the transform coefficients.

For example, the block of transform coefficients may be a two-dimensional (2D) block of coefficients that the video encoder scans to construct a one-dimensional (1D) vector, and the video encoder entropy encodes (using CABAC) the values of the transform coefficients in the 1D vector. The order in which the video encoder places the values (e.g., magnitudes) of the transform coefficients in the 1D vector is a function of the scan order. The order in which the video encoder places the magnitudes of the transform coefficients for a diagonal scan may be different than the order in which the video encoder places the magnitudes of the transform coefficients for a vertical scan.
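
To make the serialization concrete, the following C++ sketch places the coefficients of a 4×4 block into a 1D vector according to a scan table. This is a minimal illustration with hypothetical names; the construction of the scan tables themselves is sketched with FIGS. 1A-1C below.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// A scan table lists, in scan order, the raster index of each position
// in a 4x4 block (raster index = row * 4 + column).
using Scan4x4 = std::array<int, 16>;

// Serialize a 4x4 block of coefficients into a 1D vector by visiting
// positions in the order given by the scan table. The table stands in
// for whichever scan order the coder selects.
std::vector<std::int32_t> Serialize(const std::array<std::int32_t, 16>& block,
                                    const Scan4x4& scan) {
    std::vector<std::int32_t> out;
    out.reserve(block.size());
    for (int pos : scan)
        out.push_back(block[pos]);
    return out;
}
```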

In other words, the position of the magnitudes of the transform coefficients may be different for different scan orders. The position of the magnitudes of the transform coefficients may have an effect on coding efficiency. For instance, the location of the last significant coefficient, in the block, may be different for different scan orders. In this case, the magnitude of the last significant coefficient may be different for different scan orders.

Accordingly, these other techniques that determine contexts based on the position of the transform coefficient irrespective of the scan order fail to properly account for the potential that the significance statistics for a transform coefficient in a particular position may vary depending on the scan order. In the techniques described in this disclosure, the video encoder and video decoder may determine the scan order for the block, and determine contexts based on the determined scan order (and in some examples, also based on the positions of the transform coefficients and possibly the size of the block). This way, the video encoder and video decoder may better account for the significance statistics for determining which contexts to use as compared to techniques that do not rely on the scan order and rely only on the position for determining which contexts to use.

In some examples of video coding, the video encoder and the video decoder may use five coding passes to encode or decode transform coefficients of a block, namely, (1) a significance pass, (2) a greater than one pass, (3) a greater than two pass, (4) a sign pass, and (5) a coefficient level remaining pass. The techniques of this disclosure, however, are not necessarily limited to five pass scenarios. In general, significance coding refers to generating syntax elements to indicate whether any of the coefficients within the block have an absolute value of one or greater. That is, a coefficient with an absolute value of one or greater is considered “significant.” The other coding passes are described in more detail below.

During the significance pass, the video encoder determines syntax elements that indicate whether a transform coefficient is significant. Syntax elements that indicate whether a transform coefficient is significant are referred to herein as significance syntax elements. One example of a significance syntax element is a significance flag, where a value of 0 for the significance flag indicates that the coefficient is not significant (i.e., the value of the transform coefficient is 0) and a value of 1 for the significance flag indicates that the coefficient is significant (i.e., the value of the transform coefficient is non-zero).

To perform the significance pass, the video encoder scans the transform coefficients of a block or part of the block (if the position of the last significant coefficient is previously determined and signaled to the decoder), and determines the significance syntax element for each transform coefficient. There are various examples of the scan order, such as a horizontal scan, a vertical scan, and a diagonal scan. The video encoder CABAC encodes the significance syntax elements and signals the encoded significance syntax elements in a coded bitstream. Other types of scans, such as zig-zag scans and adaptive or partially adaptive scans, may also be used in some examples.

To apply CABAC coding to a syntax element, binarization may be applied to the syntax element to form a series of one or more bits, which are referred to as “bins.” In addition, a coding context may be associated with a bin of the syntax element. The coding context may identify probabilities of coding bins having particular values. For instance, a coding context may indicate a 0.7 probability of coding a 0-valued bin (representing an example of a “most probable symbol,” in this instance) and a 0.3 probability of coding a 1-valued bin. After identifying the coding context, a bin may be arithmetically coded based on the context. In some cases, contexts associated with a particular syntax element or bins thereof may be dependent on other syntax elements or coding parameters.
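
As a rough illustration of the adaptation idea, the following C++ sketch tracks and updates a probability estimate for a bin. This is a simplified stand-in for CABAC's quantized, table-driven probability states, not the actual HEVC state machine; the structure name and update rule are hypothetical.

```cpp
#include <cstdio>
#include <initializer_list>

// Toy adaptive binary context: it tracks an estimate of the probability
// that the next bin is 0 and nudges the estimate toward each observed
// bin. Real CABAC uses table-driven state transitions; the exponential
// update below only conveys the adaptation idea.
struct BinContext {
    double pZero = 0.5;  // estimated probability of a 0-valued bin

    void update(int bin) {
        const double alpha = 0.95;  // adaptation rate (illustrative)
        pZero = alpha * pZero + (bin == 0 ? 1.0 - alpha : 0.0);
    }
};

int main() {
    BinContext ctx;
    for (int bin : {0, 0, 1, 0}) {
        std::printf("P(bin = 0) before coding: %.3f\n", ctx.pZero);
        ctx.update(bin);  // adapt after each bin is (de)coded
    }
}
```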

In the techniques described in this disclosure, the video encoder may determine which contexts to use for the CABAC encoding based on the scan order. The video encoder may use one set of contexts per scan order type. For example, if the block is a 4×4 block, there are sixteen coefficients. In this example, the video encoder may utilize sixteen contexts for each scan resulting in a total of forty-eight contexts (i.e., sixteen contexts for horizontal scan, sixteen contexts for vertical scan, and sixteen contexts for diagonal scan for a total of forty-eight contexts). The same would hold for an 8×8 block, but with a total of 192 contexts (i.e., sixty-four contexts for horizontal scan, sixty-four contexts for vertical scan, and sixty-four contexts for diagonal scan for a total of 192 contexts). However, the example of forty-eight or 192 contexts is provided for purposes of illustration only. It may be possible that the number of contexts for each block is a function of block size.
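
A minimal sketch of this per-scan-order context assignment for a 4×4 block follows; the function name and the numeric ordering of the three scan types are hypothetical, chosen only to show how forty-eight contexts arise.

```cpp
#include <cassert>

enum class ScanOrder { kDiagonal = 0, kHorizontal = 1, kVertical = 2 };

// With one set of contexts per scan order, a 4x4 block uses
// 3 scans * 16 positions = 48 contexts in total; the index is simply
// offset by the scan order. (Illustrative sketch, not the HEVC tables.)
int SigContextIndex4x4(ScanOrder scan, int scanPos) {
    assert(scanPos >= 0 && scanPos < 16);
    return static_cast<int>(scan) * 16 + scanPos;
}
```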

The video decoder receives the coded bitstream (e.g., from the video encoder directly or via a storage medium that stores the coded bitstream) and performs a reciprocal function to that of the video encoder to determine the values of the transform coefficients. For example, the video decoder implements the significance pass to determine which transform coefficients are significant based on the significance syntax elements in the received bitstream.

In the techniques described in this disclosure, the video decoder may determine the scan order of the transform coefficients of the block (e.g., the scan order in which the transform coefficients were scanned). The video decoder may determine which contexts to use for CABAC decoding the significance syntax elements based on the scan order (e.g., sixteen of the forty-eight contexts for a 4×4 block or sixty-four of the 192 contexts for an 8×8 block). In this manner, the video decoder may select the same contexts for CABAC decoding that the video encoder selected for CABAC encoding. The video decoder CABAC decodes the significance syntax elements based on the determined contexts.

In the above examples, the video encoder and the video decoder determined contexts based on the scan order, where the contexts were different for different scan orders resulting in a total of forty-eight contexts for a 4×4 block and 192 contexts for an 8×8 block. However, the techniques described in this disclosure are not limited in this respect. Alternatively, in some examples, the contexts that the video encoder and the video decoder use may be the same contexts for multiple (i.e., two or more) scan orders to allow for context sharing depending on scan order type.

As one example, the video encoder and the video decoder may determine contexts that are the same if the scan order is a horizontal scan or if the scan order is a vertical scan. In other words, the contexts are the same if the scan order is the horizontal scan or if the scan order is the vertical scan for a particular position of the transform coefficient within the block. The video encoder and the video decoder may utilize different contexts for the diagonal scan. In this example, the number of contexts for the 4×4 block reduces from forty-eight contexts to thirty-two contexts and for the 8×8 block reduces from 192 contexts to 128 because the contexts for the horizontal scan and the vertical scan are the same, and there are different contexts for the diagonal scan.

As another example, it may be possible for the video encoder and the video decoder to use the same contexts for all scan order types, which reduces the contexts to sixteen for the 4×4 block and sixty-four for the 8×8 block. However, using the same contexts for all scan order types may be a function of the block size. For example, for certain block sizes, it may be possible to use the same contexts for all scan orders, and for certain other block sizes, the contexts may be different for the different scan orders, or two or more of the scan orders may share contexts.

For instance, for an 8×8 block, the contexts for the horizontal and vertical scans may be the same (e.g., for a particular position), and different for the diagonal scan. For the 4×4, 16×16, and 32×32 blocks, the contexts may be different for different scan orders. Moreover, in some other techniques that relied on position, the contexts for the 2D block and the 1D block may be different. In the techniques described in this disclosure, when contexts are shared for all scan orders, the contexts for the 2D block or the 1D block may be the same.

In some examples, in addition to utilizing the scan order to determine the contexts, the video encoder and the video decoder may account for the size of the block. For instance, in the above example, the size of the block indicated whether all scan orders share contexts. In some examples, the video encoder and the video decoder may determine which contexts to use based on the size of the block and the scan order. In these examples, the techniques described in this disclosure may allow for context sharing. For instance, for a block with a first size, the video encoder and the video decoder may determine contexts that are the same if the block of the first size is scanned horizontally or if the block of the first size is scanned vertically. For a block with a second size, the video encoder and the video decoder may determine contexts that are the same if the block of the second size is scanned horizontally or if the block of the second size is scanned vertically.

There may be other variations to these techniques. For example, for certain sized blocks (e.g., 16×16 or 32×32), the video encoder and the video decoder determine a first set of contexts that are used for CABAC encoding or CABAC decoding for all scan orders. For certain sized blocks (e.g., 8×8), the video encoder and the video decoder determine a second set of contexts that are used for CABAC encoding or CABAC decoding for a diagonal scan, and a third set of contexts that are used for CABAC encoding or CABAC decoding for both a horizontal scan and a vertical scan. For certain sized blocks (e.g., 4×4), the video encoder and the video decoder determine a fourth set of contexts that are used for CABAC encoding or CABAC decoding for a diagonal scan, a horizontal scan, and a vertical scan.
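
The following C++ sketch captures this size-dependent sharing scheme; the set identifiers and function name are hypothetical, chosen only to mirror the variation described above.

```cpp
#include <stdexcept>

enum class ScanOrder { kDiagonal, kHorizontal, kVertical };

// Returns an identifier for the context set to use, following the
// size-dependent sharing variation above (hypothetical labels):
//   set 0: 16x16 and 32x32 blocks, all scan orders
//   set 1: 8x8 blocks, diagonal scan
//   set 2: 8x8 blocks, horizontal and vertical scans (shared)
//   set 3: 4x4 blocks, all scan orders
int SelectContextSet(int blockSize, ScanOrder scan) {
    switch (blockSize) {
        case 32:
        case 16: return 0;
        case 8:  return scan == ScanOrder::kDiagonal ? 1 : 2;
        case 4:  return 3;
        default: throw std::invalid_argument("unsupported block size");
    }
}
```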

In some cases, the examples of determining contexts based on the scan order may be directed to intra-coding modes. For example, the transform coefficients may result from intra-coding, and the techniques described in this disclosure may be applicable to such transform coefficients. However, the techniques described in this disclosure are not so limited and may be applicable to inter-coding or intra-coding.

FIGS. 1A-1C are conceptual diagrams illustrating examples of scan orders of a block that includes transform coefficients. A block that includes transform coefficients may be referred to as a transform block (TB). A transform block may be a block of a transform unit. For example, a transform unit includes three transform blocks and the corresponding syntax elements. A transform unit may be a transform block of luma samples of size 8×8, 16×16, or 32×32, or four transform blocks of luma samples of size 4×4, together with two corresponding transform blocks of chroma samples, of a picture that has three sample arrays; or a transform block of luma samples of size 8×8, 16×16, or 32×32, or four transform blocks of luma samples of size 4×4, of a monochrome picture or a picture that is coded using separate color planes; together with the syntax structures used to transform the transform block samples.

FIG. 1A illustrates a horizontal scan of 4×4 block 10 (e.g., TB 10) that includes transform coefficients 12A to 12P (collectively referred to as “transform coefficients 12”). For example, the horizontal scan starts from transform coefficient 12P and ends at transform coefficient 12A, and proceeds horizontally through the transform coefficients.

FIG. 1B illustrates a vertical scan of 4×4 block 14 (e.g., TB 14) that includes transform coefficients 16A to 16P (collectively referred to as “transform coefficients 16”). For example, the vertical scan starts from transform coefficient 16P and ends at transform coefficient 16A, and proceeds vertically through the transform coefficients.

FIG. 1C illustrates a diagonal scan of 4×4 block 18 (e.g., TB 18) that includes transform coefficients 20A to 20P (collectively referred to as “transform coefficients 20”). For example, the diagonal scan starts from transform coefficient 20P and ends at transform coefficient 20A, and proceeds diagonally through the transform coefficients.
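
For concreteness, the following C++ sketch builds forward scan tables for a 4×4 block (first coefficient to last); as discussed below, the coder may traverse a table in reverse. The diagonal routine uses an up-right anti-diagonal convention, which is one common choice and may differ from the exact pattern drawn in FIG. 1C.

```cpp
#include <array>

using Scan4x4 = std::array<int, 16>;  // raster indices (row * 4 + col)

// Row-by-row traversal.
Scan4x4 HorizontalScan() {
    Scan4x4 s{};
    int k = 0;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) s[k++] = r * 4 + c;
    return s;
}

// Column-by-column traversal.
Scan4x4 VerticalScan() {
    Scan4x4 s{};
    int k = 0;
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r) s[k++] = r * 4 + c;
    return s;
}

// Up-right diagonal: walk each anti-diagonal (constant r + c) from
// bottom-left to top-right.
Scan4x4 DiagonalScan() {
    Scan4x4 s{};
    int k = 0;
    for (int d = 0; d < 7; ++d)           // anti-diagonal index r + c
        for (int r = 3; r >= 0; --r) {
            int c = d - r;
            if (c >= 0 && c < 4) s[k++] = r * 4 + c;
        }
    return s;
}
```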

It should be understood that although FIGS. 1A-1C illustrate starting from the last transform coefficient and ending on the first transform coefficient, the techniques of this disclosure are not so limited. In some examples, the video encoder may determine the location of the last significant coefficient (e.g., the last transform coefficient with a non-zero value) in the block. The video encoder may scan starting from the last significant coefficient and ending on the first transform coefficient. The video encoder may signal the location of the last significant coefficient in the coded bitstream (i.e., the x and y coordinates of the last significant coefficient), and the video decoder may receive the location of the last significant coefficient from the coded bitstream. In this manner, the video decoder may determine that subsequent syntax elements for the transform coefficients (e.g., the significance syntax elements) are for transform coefficients starting from the last significant coefficient and ending on the first transform coefficient.

Although FIGS. 1A-1C are illustrated as 4×4 blocks, the techniques described in this disclosure are not so limited, and the techniques can be extended to other sized blocks. Moreover, in some cases, one or more of 4×4 blocks 10, 14, and 18 may be sub-blocks of a larger block. For example, an 8×8 block can be divided into four 4×4 sub-blocks, a 16×16 block can be divided into sixteen 4×4 sub-blocks, and so forth, and one or more of 4×4 blocks 10, 14, and 18 may be sub-blocks of the 8×8 block or 16×16 block. Examples of sub-block horizontal and vertical scans are described in: (1) Rosewarne, C., Maeda, M., “Non-CE11: Harmonisation of 8×8 TU residual scan,” JCT-VC Contribution JCTVC-H0145; (2) Yu, Y., Panusopone, K., Lou, J., Wang, L., “Adaptive Scan for Large Blocks for HEVC,” JCT-VC Contribution JCTVC-F569; and (3) U.S. patent application Ser. No. 13/551,458, filed Jul. 17, 2012, each of which is hereby incorporated by reference.

Transform coefficients 12, 16, and 20 represent transformed residual values between a block that is being predicted and another block. The video encoder generates significance syntax elements that indicate whether the values of transform coefficients 12, 16, and 20 are zero or non-zero, encodes the significance syntax elements, and signals the encoded significance syntax elements in a coded bitstream. The video decoder receives the coded bitstream and decodes the significance syntax elements as part of the process of determining transform coefficients 12, 16, and 20.

For encoding and decoding, the video encoder and the video decoder determine contexts that are to be used for context adaptive binary arithmetic coding (CABAC) encoding and decoding. In the techniques described in this disclosure, to determine the contexts for the significance syntax elements for transform coefficients 12, 16, and 20, the video encoder and the video decoder account for the scan order.

For example, if the video encoder and the video decoder determine that the scan order is a horizontal scan, then the video encoder and the video decoder may determine a first set of contexts for the sixteen transform coefficients 12 of block 10. If the video encoder and the video decoder determine that the scan order is a vertical scan, then the video encoder and the video decoder may determine a second set of contexts for the sixteen transform coefficients 16 of block 14. If the video encoder and the video decoder determine that the scan order is a diagonal scan, then the video encoder and the video decoder may determine a third set of contexts for the sixteen transform coefficients 20 of block 18.

In this example, assuming no context sharing, there are a total of forty-eight contexts for the 4×4 blocks 10, 14, and 18 (i.e., sixteen contexts for each of the three scan orders). If blocks 10, 14, and 18 were 8×8 sized blocks, assuming no context sharing, then there would be sixty-four contexts for each of the three 8×8 sized blocks, for a total of 192 contexts (i.e., sixty-four contexts for each of the three scan orders).

As described in more detail, in some examples, it may be possible for two or more scan orders to share contexts. For example, two or more of the first set of contexts, the second set of contexts, and the third set of contexts may be the same set of contexts. For instance, the first set of contexts for the horizontal scan may be the same as the second set of contexts for the vertical scan. In some cases, the first, second, and third contexts may be the same set of contexts.

In the above examples, the video encoder and the video decoder determine from a first, second, and third set of contexts the contexts to use for CABAC encoding and decoding based on the scan order. In some examples, the video encoder and the video decoder determine which contexts to use for CABAC encoding and decoding based on the scan order and a size of the block.

For example, if the block is 8×8, then the video encoder and the video decoder determine contexts from a fourth, fifth, and sixth set of contexts (one for each scan order) based on the scan order. If the block is 16×16, then the video encoder and the video decoder determine contexts from a seventh, eighth, and ninth set of contexts (one for each scan order) based on the scan order, and so forth. Similar to above, in some examples, there may be context sharing for the different sized blocks.

There may be variants of the above example techniques. For example, in one case, for a particular sized block (e.g., 4×4), the video encoder and video decoder determine contexts that are the same for all scan orders, but for an 8×8 sized block, the video encoder and the video decoder determine contexts that are the same for a horizontal scan and a vertical scan (e.g., for transform coefficients in particular positions), and different contexts for the diagonal scan. As another example, for larger sized blocks (e.g., 16×16 and 32×32), the video encoder and the video decoder may determine contexts that are the same for all scan orders and for both sizes. In some examples, for the 16×16 and 32×32 blocks, horizontal and vertical scans may not be applied. Other such permutations and combinations are possible, and are contemplated by this disclosure.

Determining which contexts to use for CABAC encoding and decoding based on the scan order may better account for the magnitudes of the transform coefficients. For example, the scan order defines the arrangement of the transform coefficients. As one example, the magnitude of the first transform coefficient (referred to as the DC coefficient) is generally the highest. The magnitude of the second transform coefficient is the next highest (on average, but not necessarily), and so forth. However, the location of the second transform coefficient is based on the scan order. For example, in FIG. 1A, the second transform coefficient is the transform coefficient immediately to the right of the first transform coefficient (i.e., immediately right of transform coefficient 12A). However, in FIGS. 1B and 1C, the second transform coefficient is the transform coefficient immediately below the first transform coefficient (i.e., immediately below transform coefficient 16A in FIG. 1B and immediately below transform coefficient 20A in FIG. 1C).

In this way, the significance statistics for a transform coefficient in a particular scan position may vary depending on the scan order. For example, in FIG. 1A, for the horizontal scan, the last transform coefficient in the first row may have a much higher magnitude (on average) compared to the same transform coefficient in the vertical scan of FIG. 1B or the diagonal scan of FIG. 1C.

By determining which contexts to use based on the scan order, the video encoder and the video decoder may be configured to better CABAC encode or CABAC decode as compared to other techniques that do not account for the scan order. For example, it may be possible that the encoding and decoding of the significance syntax elements (e.g., significance flags) for 4×4 and 8×8 blocks is position based. For instance, there is a separate context for each position in a 4×4 block and a separate context for each 2×2 sub-block of an 8×8 block.

However, in this case, the context is based on the location of the transform coefficient, irrespective of the actual scan order (i.e., position based contexts for 4×4 and 8×8 blocks do not distinguish between the various scans). For example, the context for a transform coefficient located at (i, j) in the block is the same for the horizontal, vertical, and diagonal scans. As described above, the scan order may have an effect on the significance statistics for the transform coefficients, and the techniques described in this disclosure may determine contexts based on the scan order to account for the significance statistics.

As described above, in some examples, the video encoder and the video decoder may determine contexts that are the same for two or more scan orders. There may be various ways in which the video encoder and the video decoder may determine contexts that are the same for two or more scan orders for particular locations of transform coefficients. As one example, the horizontal and the vertical scan orders share the contexts for a particular block size by sharing contexts between the horizontal scan and a transpose of the block of the vertical scan. For instance, the video encoder and the video decoder may determine the same context for a transform coefficient (i, j) for the horizontal scan and a transform coefficient (j, i) for a vertical scan for a particular block size.

This instance is one example of where transform coefficients at a particular position share contexts for different scan orders. For example, the context for the transform coefficient at position (i, j) for a horizontal scan and the context for the transform coefficient at position (j, i) for a vertical scan may be the same context. In some examples, the sharing of the contexts may be applicable for 8×8 sized blocks of transform coefficients. Also, in some examples, if the scan order is not horizontal or vertical (e.g., diagonal), the context for position (i, j) and/or (j, i) may be different than the shared context for the horizontal and vertical scans.
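
A minimal sketch of this transpose-based sharing follows; the indexing scheme is hypothetical and serves only to show that (i, j) under the horizontal scan and (j, i) under the vertical scan resolve to the same context.

```cpp
#include <utility>

// Map a transform coefficient position to an index into a context set
// that is shared between the horizontal and vertical scans: position
// (i, j) under the horizontal scan and position (j, i) under the
// vertical scan yield the same index. A diagonal scan would index a
// separate set. (Hypothetical sketch.)
int SharedContextIndex(int i, int j, bool isVerticalScan, int blockWidth) {
    if (isVerticalScan)
        std::swap(i, j);  // transpose the position for the vertical scan
    return i * blockWidth + j;
}
```

For example, with a block width of 8, position (1, 3) under the horizontal scan and position (3, 1) under the vertical scan both map to index 11.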

However, the techniques described in this disclosure are not so limited, and should not be considered limited to examples where the contexts for a transform coefficient (i, j) for the horizontal scan and a transform coefficient (j, i) for a vertical scan for a particular block size are the same. The following is another example manner in which the contexts for transform coefficients at particular positions are shared for different scan orders.

For instance, the contexts for the fourth (last) row of the block, for the horizontal scan, may be the same as the contexts for the fourth (last) column of the block, for the vertical scan. Similarly, the contexts for the third row, for the horizontal scan, may be the same as the contexts for the third column, for the vertical scan; the contexts for the second row, for the horizontal scan, may be the same as the contexts for the second column, for the vertical scan; and the contexts for the first row, for the horizontal scan, may be the same as the contexts for the first column, for the vertical scan. The same may be applied to 8×8 blocks. There may be other example ways for the video encoder and the video decoder to determine contexts that are the same for two or more of the scan orders.

In some examples, it may be possible for contexts to be shared between different block sizes (e.g., shared between a 4×4 block and an 8×8 block). As an example, the context for transform coefficient (1, 1) in a 4×4 block and the context for transform coefficients (2, 2), (2, 3), (3, 2), and (3, 3) in an 8×8 block may be the same, and in some examples, may be the same for a particular scan order.

FIG. 2 is a conceptual diagram illustrating a mapping of transform coefficients to significance syntax elements. For example, the left side of FIG. 2 illustrates transform coefficient values and the right side of FIG. 2 illustrates corresponding significance syntax elements. For all transform coefficients whose values are non-zero, there is a corresponding significance syntax element (e.g., significance flag) with a value of 1. For all transform coefficients whose values are 0, there is a corresponding significance syntax element (e.g., significance flag) with a value of 0. In the examples described in this disclosure, the video encoder and the video decoder are configured to CABAC encode and CABAC decode the example significance syntax elements illustrated in FIG. 2 by determining contexts based on the scan order, and in some examples, also based on positions of the transform coefficients and the size of the block.
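
The mapping of FIG. 2 reduces to a simple elementwise test, sketched below in C++ for a 4×4 block stored in raster order.

```cpp
#include <array>
#include <cstdint>

// Derive the significance map illustrated in FIG. 2: the flag at each
// position is 1 when the corresponding coefficient is non-zero and 0
// otherwise.
std::array<std::uint8_t, 16> SignificanceMap(
        const std::array<std::int32_t, 16>& coeffs) {
    std::array<std::uint8_t, 16> flags{};
    for (int k = 0; k < 16; ++k)
        flags[k] = (coeffs[k] != 0) ? 1 : 0;
    return flags;
}
```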

FIG. 3 is a block diagram illustrating an example video encoding and decoding system 22 that may be configured to assign contexts utilizing the techniques described in this disclosure. As shown in FIG. 3, system 22 includes a source device 24 that generates encoded video data to be decoded at a later time by a destination device 26. Source device 24 and destination device 26 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, so-called “smart” pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, or the like. In some cases, source device 24 and destination device 26 may be equipped for wireless communication.

Destination device 26 may receive the encoded video data to be decoded via a link 28. Link 28 may comprise any type of medium or device capable of moving the encoded video data from source device 24 to destination device 26. In one example, link 28 may comprise a communication medium to enable source device 24 to transmit encoded video data directly to destination device 26 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 26. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 24 to destination device 26.

Alternatively, encoded data may be output from output interface 34 to a storage device 38. Similarly, encoded data may be accessed from storage device 38 by input interface 40. Storage device 38 may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, storage device 38 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 24. Destination device 26 may access stored video data from storage device 38 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device 26. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. Destination device 26 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from storage device 38 may be a streaming transmission, a download transmission, or a combination of both.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions, e.g., via the Internet, encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 22 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of FIG. 3, source device 24 includes a video source 30, video encoder 32 and an output interface 34. In some cases, output interface 34 may include a modulator/demodulator (modem) and/or a transmitter. In source device 24, video source 30 may include a source such as a video capture device, e.g., a video camera, a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources. As one example, if video source 30 is a video camera, source device 24 and destination device 26 may form so-called camera phones or video phones. However, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications.

The captured, pre-captured, or computer-generated video may be encoded by video encoder 32. The encoded video data may be transmitted directly to destination device 26 via output interface 34 of source device 24. The encoded video data may also (or alternatively) be stored onto storage device 38 for later access by destination device 26 or other devices, for decoding and/or playback.

Destination device 26 includes an input interface 40, a video decoder 42, and a display device 44. In some cases, input interface 40 may include a receiver and/or a modem. Input interface 40 of destination device 26 receives the encoded video data over link 28. The encoded video data communicated over link 28, or provided on storage device 38, may include a variety of syntax elements generated by video encoder 32 for use by a video decoder, such as video decoder 42, in decoding the video data. Such syntax elements may be included with the encoded video data transmitted on a communication medium, stored on a storage medium, or stored on a file server.

Display device 44 may be integrated with, or external to, destination device 26. In some examples, destination device 26 may include an integrated display device and also be configured to interface with an external display device. In other examples, destination device 26 may be a display device. In general, display device 44 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

Video encoder 32 and video decoder 42 may operate according to a video compression standard, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards. Alternatively, video encoder 32 and video decoder 42 may operate according to other proprietary or industry standards, such as the High Efficiency Video Coding (HEVC) standard, and may conform to the HEVC Test Model (HM). The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples of video compression standards include MPEG-2 and ITU-T H.263.

Although not shown in FIG. 3, in some aspects, video encoder 32 and video decoder 42 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, in some examples, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).

Video encoder 32 and video decoder 42 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, computer-readable storage medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 32 and video decoder 42 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. For example, the device that includes video decoder 42 may be a microprocessor, an integrated circuit (IC), or a wireless communication device that includes video decoder 42.

The JCT-VC is working on development of the HEVC standard. The HEVC standardization efforts are based on an evolving model of a video coding device referred to as the HEVC Test Model (HM). The HM presumes several additional capabilities of video coding devices relative to existing devices according to, e.g., ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HM may provide as many as thirty-five intra-prediction encoding modes.

In general, the working model of the HM describes that a video frame or picture may be divided into a sequence of treeblocks or largest coding units (LCU) that include both luma and chroma samples. A treeblock has a similar purpose as a macroblock of the H.264 standard. A slice includes a number of consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into coding units (CUs) according to a quadtree. For example, a treeblock, as a root node of the quadtree, may be split into four child nodes, and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, as a leaf node of the quadtree, comprises a coding node, i.e., a coded video block. Syntax data associated with a coded bitstream may define a maximum number of times a treeblock may be split, and may also define a minimum size of the coding nodes.
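
The following C++ sketch illustrates the recursive quadtree partitioning described above; the split decision is a placeholder standing in for the encoder's rate-distortion choice (or, at the decoder, split flags parsed from the bitstream), and the size thresholds are illustrative.

```cpp
#include <cstdio>

// Placeholder split decision; a real encoder makes a rate-distortion
// choice here, and a decoder reads split flags from the bitstream.
bool ShouldSplit(int /*x*/, int /*y*/, int size) { return size > 16; }

// Recursively partition a treeblock into coding units, quadtree style:
// a split node becomes four equally sized child nodes, and an unsplit
// node is a leaf (a coding node).
void PartitionCU(int x, int y, int size, int minSize) {
    if (size > minSize && ShouldSplit(x, y, size)) {
        const int half = size / 2;
        PartitionCU(x,        y,        half, minSize);
        PartitionCU(x + half, y,        half, minSize);
        PartitionCU(x,        y + half, half, minSize);
        PartitionCU(x + half, y + half, half, minSize);
    } else {
        std::printf("CU at (%d, %d), size %dx%d\n", x, y, size, size);
    }
}

int main() { PartitionCU(0, 0, 64, 8); }  // 64x64 treeblock, 8x8 minimum CU
```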

A CU includes a coding node and prediction units (PUs) and transform units (TUs) associated with the coding node. As described above, a transform unit includes one or more transform blocks, and the techniques described in this disclosure are related to determining contexts for the significance syntax elements for the transform coefficients of a transform block based on a scan order and, in some examples, based on a scan order and size of the transform block. A size of the CU corresponds to a size of the coding node and must be square in shape. The size of the CU may range from 8×8 pixels up to the size of the treeblock with a maximum of 64×64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ between whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree.

A TU can be square or non-square in shape. Again, a TU includes one or more transform blocks (TBs) (e.g., one TB for the luma samples, one TB for the first chroma samples, and one TB for the second chroma samples). In this sense, a TU can be considered conceptually as including these TBs, and these TBs can be square or non-square in shape. For example, in this disclosure, the term TU is used to generically refer to the TBs, and the example techniques described in this disclosure are described with respect to a TB.

The HEVC standard allows for transformations according to TUs, which may be different for different CUs. The TUs are typically sized based on the size of PUs within a given CU defined for a partitioned LCU, although this may not always be the case. The TUs are typically the same size or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as a “residual quad tree” (RQT). The leaf nodes of the RQT may be referred to as transform units (TUs). Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.

In general, a PU includes data related to the prediction process. For example, when the PU is intra-mode encoded (intra-prediction encoded), the PU may include data describing an intra-prediction mode for the PU. As another example, when the PU is inter-mode encoded (inter-prediction encoded), the PU may include data defining a motion vector for the PU. The data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., List 0 (L0) or List 1 (L1)) for the motion vector.

In general, a TU is used for the transform and quantization processes. A given CU having one or more PUs may also include one or more transform units (TUs). The TUs include one or more transform blocks (TBs). Blocks 10, 14, and 18 of FIGS. 1A-1C, respectively, are examples of TBs. Following prediction, video encoder 32 may calculate residual values corresponding to the PU. The residual values comprise pixel difference values that may be transformed into transform coefficients, quantized, and scanned using the TBs to produce serialized transform coefficients for entropy coding. This disclosure typically uses the term “video block” to refer to a coding node of a CU. In some specific cases, this disclosure may also use the term “video block” to refer to a treeblock, i.e., LCU, or a CU, which includes a coding node and PUs. The term “video block” may also refer to transform blocks of a TU.

For example, for video coding according to the high efficiency video coding (HEVC) standard currently under development, a video picture may be partitioned into coding units (CUs), prediction units (PUs), and transform units (TUs). A CU generally refers to an image region that serves as a basic unit to which various coding tools are applied for video compression. A CU typically has a square geometry, and may be considered to be similar to a so-called “macroblock” under other video coding standards, such as, for example, ITU-T H.264.

To achieve better coding efficiency, a CU may have a variable size depending on the video data it contains. That is, a CU may be partitioned, or “split,” into smaller blocks, or sub-CUs, each of which may also be referred to as a CU. In addition, each CU that is not split into sub-CUs may be further partitioned into one or more PUs and TUs for purposes of prediction and transform of the CU, respectively.

PUs may be considered to be similar to so-called partitions of a block under other video coding standards, such as H.264. PUs are the basis on which prediction for the block is performed to produce “residual” coefficients. Residual coefficients of a CU represent a difference between video data of the CU and predicted data for the CU determined using one or more PUs of the CU. Specifically, the one or more PUs specify how the CU is partitioned for the purpose of prediction, and which prediction mode is used to predict the video data contained within each partition of the CU.

One or more TUs of a CU specify partitions of a block of residual coefficients of the CU on the basis of which a transform is applied to the block to produce a block of residual transform coefficients for the CU. The one or more TUs may also be associated with the type of transform that is applied. The transform converts the residual coefficients from a pixel, or spatial, domain to a transform domain, such as a frequency domain. In addition, the one or more TUs may specify parameters on the basis of which quantization is applied to the resulting block of residual transform coefficients to produce a block of quantized residual transform coefficients. The residual transform coefficients may be quantized to possibly reduce the amount of data used to represent the coefficients.

A CU generally includes one luminance component, denoted as Y, and two chrominance components, denoted as U and V. In other words, a given CU that is not further split into sub-CUs may include Y, U, and V components, each of which may be further partitioned into one or more PUs and TUs for purposes of prediction and transform of the CU, as previously described. For example, depending on the video sampling format, the size of the U and V components, in terms of a number of samples, may be the same as or different than the size of the Y component. As such, the techniques described above with reference to prediction, transform, and quantization may be performed for each of the Y, U, and V components of a given CU.

To encode a CU, one or more predictors for the CU are first derived based on one or more PUs of the CU. A predictor is a reference block that contains predicted data for the CU, and is derived on the basis of a corresponding PU for the CU, as previously described. For example, the PU indicates a partition of the CU for which predicted data is to be determined, and a prediction mode used to determine the predicted data. The predictor can be derived either through intra- (I) prediction (i.e., spatial prediction) or inter- (P or B) prediction (i.e., temporal prediction) modes. Hence, some CUs may be intra-coded (I) using spatial prediction with respect to neighboring reference blocks, or CUs, in the same frame, while other CUs may be inter-coded (P or B) with respect to reference blocks, or CUs, in other frames.

Upon identification of the one or more predictors based on the one or more PUs of the CU, a difference between the original video data of the CU corresponding to the one or more PUs and the predicted data for the CU contained in the one or more predictors is calculated. This difference, also referred to as a prediction residual, comprises residual coefficients, and refers to pixel differences between portions of the CU specified by the one or more PUs and the one or more predictors, as previously described. The residual coefficients are generally arranged in a two-dimensional (2-D) array that corresponds to the one or more PUs of the CU.

To achieve further compression, the prediction residual is generally transformed, e.g., using a discrete cosine transform (DCT), integer transform, Karhunen-Loeve (K-L) transform, or another transform. The transform converts the prediction residual, i.e., the residual coefficients, in the spatial domain to residual transform coefficients in the transform domain, e.g., a frequency domain, as also previously described. On some occasions the transform is skipped, i.e., no transform is applied to the prediction residual. Transform skipped coefficients are also referred to as transform coefficients. The transform coefficients (including transform skip coefficients) are also generally arranged in a 2-D array that corresponds to the one or more TUs of the CU. For further compression, the residual transform coefficients may be quantized to possibly reduce the amount of data used to represent the coefficients, as also previously described.

To achieve still further compression, an entropy coder subsequently encodes the resulting residual transform coefficients, using Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), Probability Interval Partitioning Entropy Coding (PIPE), or another entropy coding methodology. Entropy coding may achieve this further compression by reducing or removing statistical redundancy inherent in the video data of the CU, represented by the coefficients, relative to other CUs.

A video sequence typically includes a series of video frames or pictures. A group of pictures (GOP) generally comprises a series of one or more of the video pictures. A GOP may include syntax data in a header of the GOP, a header of one or more of the pictures, or elsewhere, that describes a number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 32 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a coding node within a CU (e.g., a transform block of transform coefficients). The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.

As an example, the HM supports prediction in various PU sizes. Assuming that the size of a particular CU is 2N×2N, the HM supports intra-prediction in PU sizes of 2N×2N or N×N, and inter-prediction in symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter-prediction in PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75% portions. The portion of the CU corresponding to the 25% partition is indicated by an “n” followed by an indication of “Up,” “Down,” “Left,” or “Right.” Thus, for example, “2N×nU” refers to a 2N×2N CU that is partitioned horizontally with a 2N×0.5N PU on top and a 2N×1.5N PU on bottom.
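
As a purely illustrative aid, the arithmetic behind the asymmetric modes can be sketched in C++ as follows; the function name asymmetricHeights and its signature are assumptions for illustration and are not drawn from the HM source.

    // Sketch only: for a CU of width 2N, return the heights of the top and
    // bottom PUs for the asymmetric modes (2NxnU places the 0.5N PU on top,
    // 2NxnD places it on the bottom).
    #include <cstdio>
    #include <utility>

    std::pair<int, int> asymmetricHeights(int cuSize /* = 2N */, bool smallPartOnTop) {
        int quarter = cuSize / 4;             // the 25% partition (0.5N)
        int threeQuarter = cuSize - quarter;  // the 75% partition (1.5N)
        return smallPartOnTop ? std::make_pair(quarter, threeQuarter)
                              : std::make_pair(threeQuarter, quarter);
    }

    int main() {
        std::pair<int, int> h = asymmetricHeights(32, true);  // 2NxnU for a 32x32 CU
        std::printf("top PU: 32x%d, bottom PU: 32x%d\n", h.first, h.second);
        return 0;
    }

For a 32×32 CU, this prints a 32×8 PU on top and a 32×24 PU on the bottom, matching the 25%/75% split described above.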

In this disclosure, “N×N” and “N by N” may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16×16 pixels or 16 by 16 pixels. In general, a 16×16 block will have 16 pixels in a vertical direction (y=16) and 16 pixels in a horizontal direction (x=16). Likewise, an N×N block generally has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a nonnegative integer value. The pixels in a block may be arranged in rows and columns. Moreover, blocks need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction. For example, blocks may comprise N×M pixels, where M is not necessarily equal to N.

Following intra-predictive or inter-predictive encoding using the PUs of a CU, video encoder 32 may calculate residual data for the TUs of the CU. The PUs may comprise pixel data in the spatial domain (also referred to as the pixel domain) and the TUs may comprise coefficients in the transform domain following application of a transform, e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, a skip transform, or a conceptually similar transform to residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs. Video encoder 32 may form the TUs including the residual data for the CU, and then transform the TUs to produce transform coefficients for the CU.

Following any transforms to produce transform coefficients, video encoder 32 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
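
A minimal sketch of such a bit-depth-reducing quantizer, assuming a simple round-to-nearest rule with an illustrative shift parameter rather than the exact HEVC derivation from a quantization parameter, is shown below.

    #include <cstdlib>

    // Sketch only: drop `shift` bits of precision (n-bit input to m-bit
    // output, shift = n - m) with rounding to the nearest level.
    int quantize(int coeff, int shift) {
        if (shift <= 0) return coeff;
        int sign = (coeff < 0) ? -1 : 1;
        int level = (std::abs(coeff) + (1 << (shift - 1))) >> shift;
        return sign * level;
    }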

In some examples, video encoder 32 may utilize a predefined scan order (e.g., horizontal, vertical, or diagonal) to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In some examples, video encoder 32 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 32 may entropy encode the one-dimensional vector, e.g., according to context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding or another entropy encoding methodology. Video encoder 32 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 42 in decoding the video data.
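
As an illustration only, serializing a 4×4 block with an up-right diagonal scan might look like the following sketch; horizontal and vertical scans would instead iterate rows-first or columns-first. The function is a hypothetical sketch, not a scan table from any standard.

    #include <vector>

    // Walk each anti-diagonal (row + col == d) from bottom-left to top-right,
    // appending coefficients to a one-dimensional vector.
    std::vector<int> diagonalScan(const int block[4][4]) {
        std::vector<int> out;
        for (int d = 0; d <= 6; ++d)
            for (int row = 3; row >= 0; --row) {
                int col = d - row;
                if (col >= 0 && col < 4)
                    out.push_back(block[row][col]);
            }
        return out;
    }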

To perform CABAC, video encoder 32 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero or not. To perform CAVLC, video encoder 32 may select a variable length code for a symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve a bit savings over, for example, using equal-length codewords for each symbol to be transmitted. The probability determination may be based on a context assigned to the symbol.

Video decoder 42 may be configured to implement the reciprocal of the encoding techniques implemented by video encoder 32. For example, for the encoded significance syntax elements, video decoder 42 may decode the significance syntax elements by determining which contexts to use based on the determined scan order.

For instance, video encoder 32 signals syntax elements that indicate the values of the transform coefficients. As one example, video encoder 32 generates these syntax elements in five passes, although five passes are not necessary in every example. Video encoder 32 determines the location of the last significant coefficient and begins the first pass from the last significant coefficient. After the first pass, video encoder 32 implements each of the remaining four passes only on those transform coefficients remaining from the previous pass. In the first pass, video encoder 32 scans the transform coefficients using one of the scan orders illustrated in FIGS. 1A-1C and determines a significance syntax element for each transform coefficient that indicates whether the value of the transform coefficient is zero or non-zero (i.e., insignificant or significant).

In the second pass, referred to as a greater than one pass, video encoder 32 generates syntax elements to indicate whether the absolute value of a significant coefficient is larger than one. In a similar manner, in the third pass, referred to as the greater than two pass, video encoder 32 generates syntax elements to indicate whether the absolute value of a greater than one coefficient is larger than two.

In the fourth pass, referred to as a sign pass, video encoder 32 generates syntax elements to indicate the sign information for significant coefficients. In the fifth pass, referred to as a coefficient level remaining pass, video encoder 32 generates syntax elements that indicate the remaining absolute value of a transform coefficient level (e.g., the remainder value). The remainder value may be coded as the absolute value of the coefficient minus 3. It should be noted that the five-pass approach is just one example technique that may be used for coding transform coefficients, and the techniques described herein may be equally applicable to other techniques.
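
The five passes may be illustrated with the following hedged sketch over an already-serialized coefficient vector. The sketch omits details such as processing in backward scan order from the last significant coefficient and bounding how many greater-than-one and greater-than-two flags are coded per sub-block; the CoeffSyntax structure is an assumption for illustration.

    #include <cstdlib>
    #include <vector>

    struct CoeffSyntax { bool sig; bool gt1; bool gt2; bool sign; int remain; };

    std::vector<CoeffSyntax> fivePasses(const std::vector<int>& c) {
        std::vector<CoeffSyntax> s(c.size());  // value-initialized to all zeros
        for (size_t i = 0; i < c.size(); ++i) s[i].sig = (c[i] != 0);                         // pass 1: significance
        for (size_t i = 0; i < c.size(); ++i) if (s[i].sig) s[i].gt1 = std::abs(c[i]) > 1;    // pass 2: greater than one
        for (size_t i = 0; i < c.size(); ++i) if (s[i].gt1) s[i].gt2 = std::abs(c[i]) > 2;    // pass 3: greater than two
        for (size_t i = 0; i < c.size(); ++i) if (s[i].sig) s[i].sign = (c[i] < 0);           // pass 4: sign
        for (size_t i = 0; i < c.size(); ++i) if (s[i].gt2) s[i].remain = std::abs(c[i]) - 3; // pass 5: remainder
        return s;
    }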

In the techniques described in this disclosure, video encoder 32 encodes the significance syntax elements using context adaptive binary arithmetic coding (CABAC). In accordance with the techniques described in this disclosure, video encoder 32 may determine a scan order for the transform coefficients of the block, and determine contexts for the significance syntax elements of the transform coefficients of the block based on the determined scan order. Video encoder 32 may CABAC encode the significance syntax elements based on the determined contexts, and signal the encoded significance syntax elements in the coded bitstream.

Video decoder 42 may be configured to perform similar functions. For example, video decoder 42 receives from the coded bitstream significance syntax elements of transform coefficients of a block. Video decoder 42 may determine a scan order for the transform coefficients of the block (e.g., an order in which video encoder 32 scanned the transform coefficients), and may determine contexts for the significance syntax elements based on the determined scan order. Video decoder 42 may then CABAC decode the significance syntax elements of the transform coefficients based at least on the determined contexts.

In some examples, video encoder 32 and video decoder 42 each determine contexts that are the same if the determined scan order is a horizontal scan or a vertical scan, and determine contexts that are different from the contexts for the horizontal and vertical scans if the determined scan order is a diagonal scan. In general, video encoder 32 and video decoder 42 may each determine a first set of contexts for the significance syntax elements if the scan order is a first scan order, and determine a second set of contexts for the significance syntax elements if the scan order is a second scan order. The first set of contexts and the second set of contexts may be the same in some cases (e.g., where the first scan order is a horizontal scan and the second scan order is a vertical scan, or vice-versa). The first set of contexts and the second set of contexts may be different in some cases (e.g., where the first scan order is either a horizontal or a vertical scan and the second scan order is not a horizontal or a vertical scan).

In some examples, video encoder 32 and video decoder 42 also determine a size of the block. In some of these examples, video encoder 32 and video decoder 42 determine the contexts for the significance syntax elements based on the determined scan order and based on the determined size of the block. For example, to determine the contexts, video encoder 32 and video decoder 42 may determine, based on the size of the block, contexts for the significance syntax elements of the transform coefficients that are the same for all scan orders. In other words, for certain sized blocks, video encoder 32 and video decoder 42 may determine contexts that are the same for all scan orders.

In some examples, the techniques described in this disclosure may build upon the concepts of sub-block horizontal and vertical scans, such as those described in: (1) Rosewarne, C., Maeda, M., “Non-CE11: Harmonisation of 8×8 TU residual scan,” JCT-VC Contribution JCTVC-H0145; (2) Yu, Y., Panusopone, K., Lou, J., Wang, L., “Adaptive Scan for Large Blocks for HEVC,” JCT-VC Contribution JCTVC-F569; and (3) U.S. patent application Ser. No. 13/551,458, filed Jul. 17, 2012. For instance, the techniques described in this disclosure provide for improvement in the coding of significance syntax elements and harmonization across different scan orders and block (e.g., TU) sizes.

For example, as described above, a 4×4 block may be a sub-block of a larger block. In the techniques described in this disclosure, relatively large sized blocks (e.g., 16×16 or 32×32) may be divided into 4×4 sub-blocks, and video encoder 32 and video decoder 42 may be configured to determine the contexts for the 4×4 sub-blocks based on the scan order. In some examples, such techniques may be extendable to 8×8 sized blocks as well as for all scan orders (i.e., the 4×4 sub-blocks of the 8×8 block can be scanned horizontally, vertically, or diagonally). Such techniques may also allow for context sharing between the different scan orders.

In some examples, video encoder 32 and video decoder 42 determine contexts that are the same for all block sizes if the scan order is a diagonal scan (i.e., the contexts are shared for all of the TUs when using the diagonal scan). In this example, video encoder 32 and video decoder 42 may determine another set of contexts that are the same for the horizontal and vertical scan, which allows for context sharing depending on the scan order.

In some examples, there may be three sets of contexts: one for relatively large blocks, one for the diagonal scan of the 8×8 block or the 4×4 block, and one for both horizontal and vertical scans of the 8×8 block or the 4×4 block, where the contexts for the 8×8 block and the 4×4 block are different. Other combinations and permutations of the sizes and the scan orders may be possible, and video encoder 32 and video decoder 42 may be configured to determine contexts that are the same for these various combinations and permutations of sizes and scan orders.
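
The three-set arrangement may be illustrated with the following sketch, in which the returned indices are arbitrary labels rather than table numbers from any standard; within sets 1 and 2, the 4×4 and 8×8 blocks would still index different contexts, as noted above.

    enum Scan { DIAG, HORIZ, VERT };

    // Sketch only: map (block size, scan order) to one of three context sets.
    int contextSet(int blockSize, Scan scan) {
        if (blockSize >= 16) return 0;  // set 0: relatively large blocks, all scans
        if (scan == DIAG)    return 1;  // set 1: diagonal scan of the 8x8 or 4x4 block
        return 2;                       // set 2: horizontal or vertical scan of the 8x8 or 4x4 block
    }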

FIG. 4 is a block diagram illustrating an example video encoder 32 that may implement the techniques described in this disclosure. In the example of FIG. 4, video encoder 32 includes a mode select unit 46, prediction processing unit 48, reference picture memory 70, summer 56, transform processing unit 58, quantization processing unit 60, and entropy encoding unit 62. Prediction processing unit 48 includes motion estimation unit 50, motion compensation unit 52, and intra prediction unit 54. For video block reconstruction, video encoder 32 also includes inverse quantization processing unit 64, inverse transform processing unit 66, and summer 68. A deblocking filter (not shown in FIG. 4) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 68. Additional loop filters (in loop or post loop) may also be used in addition to the deblocking filter. It should be noted that prediction processing unit 48 and transform processing unit 58 should not be confused with PUs and TUs as described above.

As shown in FIG. 4, video encoder 32 receives video data, and mode select unit 46 partitions the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning, e.g., according to a quadtree structure of LCUs and CUs. Video encoder 32 generally illustrates the components that encode video blocks within a video slice to be encoded. A slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction processing unit 48 may select one of a plurality of possible coding modes, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, for the current video block based on error results (e.g., coding rate and the level of distortion). Prediction processing unit 48 may provide the resulting intra- or inter-coded block to summer 56 to generate residual block data and to summer 68 to reconstruct the encoded block for use as a reference picture.

Intra prediction unit 54 within prediction processing unit 48 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded to provide spatial compression. Motion estimation unit 50 and motion compensation unit 52 within prediction processing unit 48 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.

Motion estimation unit 50 may be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern may designate video slices in the sequence as P slices or B slices. Motion estimation unit 50 and motion compensation unit 52 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 50, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture.

A predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics. In some examples, video encoder 32 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 70. For example, video encoder 32 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 50 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
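
For illustration, the SAD metric mentioned above may be computed as in the following sketch; the stride parameters and function name are assumptions for illustration.

    #include <cstdlib>

    // Sum of absolute differences between the current block and a candidate
    // predictive block; motion search evaluates this over many displacements.
    int sad(const unsigned char* cur, const unsigned char* ref,
            int width, int height, int curStride, int refStride) {
        int sum = 0;
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                sum += std::abs(cur[y * curStride + x] - ref[y * refStride + x]);
        return sum;
    }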

Motion estimation unit 50 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in reference picture memory 70. Motion estimation unit 50 sends the calculated motion vector to entropy encoding unit 62 and motion compensation unit 52.

Motion compensation, performed by motion compensation unit 52, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolations to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 52 may locate the predictive block to which the motion vector points in one of the reference picture lists. Video encoder 32 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. The pixel difference values form residual data for the block, and may include both luma and chroma difference components. Summer 56 represents the component or components that perform this subtraction operation. Motion compensation unit 52 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 42 in decoding the video blocks of the video slice.

Intra-prediction unit 54 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 50 and motion compensation unit 52, as described above. In particular, intra-prediction unit 54 may determine an intra-prediction mode to use to encode a current block. In some examples, intra-prediction unit 54 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 54 (or mode select unit 46, in some examples) may select an appropriate intra-prediction mode to use from the tested modes. For example, intra-prediction unit 54 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. Intra-prediction unit 54 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
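
Rate-distortion selection of this kind is commonly expressed as minimizing a Lagrangian cost J = D + λR. The sketch below illustrates that selection loop; the ModeResult structure and the treatment of lambda as a caller-supplied tuning value are assumptions for illustration, not drawn from any reference encoder.

    #include <limits>
    #include <vector>

    struct ModeResult { int mode; double distortion; int bits; };

    // Return the candidate mode with the smallest Lagrangian cost J = D + lambda * R.
    int bestMode(const std::vector<ModeResult>& candidates, double lambda) {
        int best = -1;
        double bestCost = std::numeric_limits<double>::infinity();
        for (const ModeResult& m : candidates) {
            double j = m.distortion + lambda * m.bits;
            if (j < bestCost) { bestCost = j; best = m.mode; }
        }
        return best;
    }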

In any case, after selecting an intra-prediction mode for a block, intra-prediction unit 54 may provide information indicative of the selected intra-prediction mode for the block to entropy encoding unit 62. Entropy encoding unit 62 may encode the information indicating the selected intra-prediction mode in accordance with the entropy techniques described herein.

After prediction processing unit 48 generates the predictive block for the current video block via either inter-prediction or intra-prediction, video encoder 32 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TBs and applied to transform processing unit 58. Transform processing unit 58 may transform the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. Transform processing unit 58 may convert the residual video data from a pixel domain to a transform domain, such as a frequency domain. In some cases, transform processing unit 58 may apply a 2-dimensional (2-D) transform (in both the horizontal and vertical direction) to the residual data in the TBs. In some examples, transform processing unit 58 may instead apply a horizontal 1-D transform, a vertical 1-D transform, or no transform to the residual data in each of the TBs.

Transform processing unit 58 may send the resulting transform coefficients to quantization processing unit 60. Quantization processing unit 60 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization processing unit 60 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 62 may perform the scan.

As described above, the scan performed on a transform block may be based on the size of the transform block. Quantization processing unit 60 and/or entropy encoding unit 62 may scan 8×8, 16×16, and 32×32 transform blocks using any combination of the sub-block scans described above with respect to FIGS. 1A-1C. When more than one scan is available for a transform block, entropy encoding unit 62 may determine a scan order based on a coding parameter associated with the transform block, such as a prediction mode associated with a prediction unit corresponding to the transform block. Further details with respect to entropy encoding unit 62 are described below with respect to FIG. 5.
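
A hedged sketch of such parameter-dependent scan selection appears below; the intra-mode ranges used here are illustrative stand-ins, not normative thresholds.

    enum Scan { DIAG, HORIZ, VERT };

    // Sketch only: small intra blocks pick a scan from the intra prediction
    // mode; inter blocks and larger blocks fall back to the diagonal scan.
    Scan selectScan(bool isIntra, int intraMode, int blockSize) {
        if (!isIntra || blockSize > 8) return DIAG;
        if (intraMode >= 6 && intraMode <= 14) return VERT;   // near-horizontal prediction
        if (intraMode >= 22 && intraMode <= 30) return HORIZ; // near-vertical prediction
        return DIAG;
    }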

Inverse quantization processing unit 64 and inverse transform processing unit 66 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. Motion compensation unit 52 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 52 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 68 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 52 to produce a reference block for storage in reference picture memory 70. The reference block may be used by motion estimation unit 50 and motion compensation unit 52 as a reference block to inter-predict a block in a subsequent video frame or picture.

Following quantization, entropy encoding unit 62 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 62 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy encoding methodology or technique. Following the entropy encoding by entropy encoding unit 62, the encoded bitstream may be transmitted to video decoder 42, or archived for later transmission or retrieval by video decoder 42. Entropy encoding unit 62 may also entropy encode the motion vectors and the other syntax elements for the current video slice being coded. Entropy encoding unit 62 may entropy encode syntax elements such as the significance syntax elements and the other syntax elements for the transform coefficients described above using CABAC.

In some examples, entropy encoding unit 62 may be configured to implement the techniques described in this disclosure of determining contexts based on a determined scan order. In some examples, entropy encoding unit 62 in conjunction with one or more units within video encoder 32 may be configured to implement the techniques described in this disclosure. In some examples, a processor or processing unit (not shown) of video encoder 32 may be configured to implement the techniques described in this disclosure.

FIG. 5 is a block diagram that illustrates an example entropy encoding unit 62 that may implement the techniques described in this disclosure. The entropy encoding unit 62 illustrated in FIG. 5 may be a CABAC encoder. The example entropy encoding unit 62 may include a binarization unit 72, an arithmetic encoding unit 80, which includes a bypass encoding engine 74 and a regular encoding engine 78, and a context modeling unit 76.

Entropy encoding unit 62 may receive one or more syntax elements, such as the significance syntax element, referred to as a significant_coefficient_flag in HEVC, the greater than 1 flag, referred to as a coeff_abs_level_greater1 flag in HEVC, the greater than 2 flag, referred to as a coeff_abs_level_greater2 flag in HEVC, the sign flag, referred to as coeff_sign_flag in HEVC, and the level syntax element, referred to as coeff_abs_level_remain. Binarization unit 72 receives a syntax element and produces a bin string (i.e., binary string). Binarization unit 72 may use, for example, any one or combination of the following techniques to produce a bin string: fixed length coding, unary coding, truncated unary coding, truncated Rice coding, Golomb coding, exponential Golomb coding, and Golomb-Rice coding. Further, in some cases, binarization unit 72 may receive a syntax element as a binary string and simply pass through the bin values. In one example, binarization unit 72 receives the significance syntax element and produces a bin string.
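
As one illustration, truncated unary binarization, under the common convention of v ones followed by a terminating zero that is omitted when v equals the maximum value, may be sketched as follows.

    #include <string>

    // Sketch only: truncated unary bin string for a value v in [0, cMax].
    std::string truncatedUnary(int v, int cMax) {
        std::string bins(static_cast<size_t>(v), '1');
        if (v < cMax) bins += '0';
        return bins;
    }

With cMax = 3, the values 0, 1, 2, and 3 map to the bin strings "0", "10", "110", and "111", respectively.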

Arithmetic encoding unit 80 is configured to receive a bin string from binarization unit 72 and perform arithmetic encoding on the bin string. As shown in FIG. 5, arithmetic encoding unit 80 may receive bin values from a bypass path or the regular coding path. Bin values that follow the bypass path may be bin values identified as bypass coded, and bin values that follow the regular encoding path may be identified as CABAC-coded. Consistent with the CABAC process described above, in the case where arithmetic encoding unit 80 receives bin values from a bypass path, bypass encoding engine 74 may perform arithmetic encoding on bin values without utilizing an adaptive context assigned to a bin value. In one example, bypass encoding engine 74 may assume equal probabilities for possible values of a bin.

In the case where arithmetic encoding unit 80 receives bin values through the regular path, context modeling unit 76 may provide a context variable (e.g., a context state), such that regular encoding engine 78 may perform arithmetic encoding based on the context assignments provided by context modeling unit 76. The context assignments may be defined according to a video coding standard, such as the HEVC standard. Further, in one example, context modeling unit 76 and/or entropy encoding unit 62 may be configured to determine contexts for bins of the significance syntax elements based on techniques described herein. The techniques may be incorporated into HEVC or another video coding standard. The context models may be stored in memory. Context modeling unit 76 may include a series of indexed tables and/or utilize mapping functions to determine a context and a context variable for a particular bin. After encoding a bin value, regular encoding engine 78 may update a context based on the actual bin values.
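
The adaptive update may be illustrated, in a deliberately non-normative way, by a context that tracks an estimated probability of a bin being 1 and nudges that estimate toward each coded bin value. Actual CABAC uses a finite set of probability states and a transition table rather than the floating-point average shown here.

    // Sketch only: an adaptive context as an exponential moving average.
    struct Context {
        double p1 = 0.5;  // estimated probability that the next bin is 1

        void update(int bin /* 0 or 1 */, double alpha = 0.05) {
            p1 += alpha * (bin - p1);  // move the estimate toward the coded bin
        }
    };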

FIG. 6 is a flowchart illustrating an example process for encoding video data according to this disclosure. Although the process in FIG. 6 is described below as generally being performed by video encoder 32, the process may be performed by any combination of video encoder 32, entropy encoding unit 62, and/or context modeling unit 76.

As illustrated, video encoder 32 may determine a scan order for transform coefficients of a block (82). Video encoder 32 may determine contexts for the transform coefficients based on the scan order (84). In some examples, video encoder 32 determines the contexts based on the determined scan order, positions of the transform coefficients within the block, and a size of the block. For example, for a particular block size (e.g., an 8×8 block of transform coefficients) and a particular position (e.g., transform coefficient position), video encoder 32 may determine the same context if the scan order is either the horizontal scan or the vertical scan, and determine a different context if the scan order is not the horizontal scan or the vertical scan.

Video encoder 32 may CABAC encode significance syntax elements (e.g., significance flags) for the transform coefficients based on the determined contexts (86). Video encoder 32 may signal the encoded significance syntax elements (e.g., significance flags) (88).

FIG. 7 is a block diagram illustrating an example video decoder 42 that may implement the techniques described in this disclosure. In the example of FIG. 7, video decoder 42 includes an entropy decoding unit 90, prediction processing unit 92, inverse quantization processing unit 98, inverse transform processing unit 100, summer 102, and reference picture memory 104. Prediction processing unit 92 includes motion compensation unit 94 and intra prediction unit 96. Video decoder 42 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 32 from FIG. 4.

During the decoding process, video decoder 42 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 32. Entropy decoding unit 90 of video decoder 42 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 90 forwards the motion vectors and other syntax elements to prediction processing unit 92. Video decoder 42 may receive the syntax elements at the video slice level and/or the video block level.

In some examples, entropy decoding unit 90 may be configured to implement the techniques described in this disclosure of determining contexts based on a determined scan order. In some examples, entropy decoding unit 90 in conjunction with one or more units within video decoder 42 may be configured to implement the techniques described in this disclosure. In some examples, a processor or processing unit (not shown) of video decoder 42 may be configured to implement the techniques described in this disclosure.

FIG. 8 is a block diagram that illustrates an example entropy decoding unit 90 that may implement the techniques described in this disclosure. Entropy decoding unit 90 receives an entropy encoded bitstream and decodes syntax elements from the bitstream. The syntax elements may include the significant_coefficient_flag, coeff_abs_level_remain, coeff_abs_level_greater1 flag, coeff_abs_level_greater2 flag, and coeff_sign_flag syntax elements described above for transform coefficients of a block. The example entropy decoding unit 90 in FIG. 8 includes an arithmetic decoding unit 106, which may include a bypass decoding engine 108 and a regular decoding engine 110. The example entropy decoding unit 90 also includes context modeling unit 112 and inverse binarization unit 114. The example entropy decoding unit 90 may perform the reciprocal functions of the example entropy encoding unit 62 described with respect to FIG. 5. In this manner, entropy decoding unit 90 may perform entropy decoding based on the techniques described in this disclosure.

Arithmetic decoding unit 106 receives an encoded bit stream. As shown in FIG. 8, arithmetic decoding unit 106 may process encoded bin values according to a bypass path or the regular coding path. An indication of whether an encoded bin value should be processed according to the bypass path or the regular path may be signaled in the bitstream with higher level syntax. Consistent with the CABAC process described above, in the case where arithmetic decoding unit 106 receives bin values from a bypass path, bypass decoding engine 108 may perform arithmetic decoding on bin values without utilizing a context assigned to a bin value. In one example, bypass decoding engine 108 may assume equal probabilities for possible values of a bin.

In the case where arithmetic decoding unit 106 receives bin values through the regular path, context modeling unit 112 may provide a context variable, such that regular decoding engine 110 may perform arithmetic decoding based on the context assignments provided by context modeling unit 112. The context assignments may be defined according to a video coding standard, such as HEVC. The context models may be stored in memory. Context modeling unit 112 may include a series of indexed tables and/or utilize mapping functions to determine a context and a context variable for a particular bin of an encoded bitstream. Further, in one example, context modeling unit 112 and/or entropy decoding unit 90 may be configured to assign contexts to bins of the significance syntax elements based on techniques described herein. After decoding a bin value, regular decoding engine 110 may update a context based on the decoded bin values. Further, inverse binarization unit 114 may perform an inverse binarization on a bin value and use a bin matching function to determine whether a bin value is valid. The inverse binarization unit 114 may also update the context modeling unit based on the matching determination. Thus, the inverse binarization unit 114 outputs syntax elements according to a context adaptive decoding technique.

Referring back to FIG. 7, when the video slice is coded as an intra-coded (I) slice, intra prediction unit 96 of prediction processing unit 92 may generate prediction data for a video block of the current video slice based on a signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B or P) slice, motion compensation unit 94 of prediction processing unit 92 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 90. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 42 may construct the reference picture lists, List 0 and List 1, using default construction techniques based on reference pictures stored in reference picture memory 104.

Motion compensation unit 94 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 94 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice or P slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.

Motion compensation unit 94 may also perform interpolation based on interpolation filters. Motion compensation unit 94 may use interpolation filters as used by video encoder 32 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 94 may determine the interpolation filters used by video encoder 32 from the received syntax elements and use the interpolation filters to produce predictive blocks.

Inverse quantization processing unit 98 inverse quantizes, i.e., de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 90. The inverse quantization process may include use of a quantization parameter calculated by video encoder 32 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. Inverse transform processing unit 100 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.

In some cases, inverse transform processing unit 100 may apply a 2-dimensional (2-D) inverse transform (in both the horizontal and vertical direction) to the coefficients. In some examples, inverse transform processing unit 100 may instead apply a horizontal 1-D inverse transform, a vertical 1-D inverse transform, or no transform to the residual data in each of the TUs. The type of transform applied to the residual data at video encoder 32 may be signaled to video decoder 42 so that video decoder 42 may apply an appropriate type of inverse transform to the transform coefficients.

After motion compensation unit 94 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 42 forms a decoded video block by summing the residual blocks from inverse transform processing unit 100 with the corresponding predictive blocks generated by motion compensation unit 94. Summer 102 represents the component or components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions, or otherwise improve the video quality. The decoded video blocks in a given frame or picture are then stored in reference picture memory 104, which stores reference pictures used for subsequent motion compensation. Reference picture memory 104 also stores decoded video for later presentation on a display device, such as display device 44 of FIG. 3.

FIG. 9 is a flowchart illustrating an example process for decoding video data according to this disclosure. Although the process in FIG. 9 is described below as generally being performed by video decoder 42, the process may be performed by any combination of video decoder 42, entropy decoding unit 90, and/or context modeling unit 112.

As illustrated in FIG. 9, video decoder 42 receives, from a coded bitstream, significance syntax elements (e.g., significance flags) for transform coefficients of a block (116). Video decoder 42 determines a scan order for the transform coefficients (118). Video decoder 42 determines contexts for the transform coefficients based on the determined scan order (120). In some examples, video decoder 42 also determines the block size and determines the contexts based on the determined scan order and block size. In some examples, video decoder 42 determines the contexts based on the determined scan order, positions of the transform coefficients within the block, and a size of the block. For example, for a particular block size (e.g., an 8×8 block of transform coefficients) and a particular position (e.g., transform coefficient position), video decoder 42 may determine the same context if the scan order is either the horizontal scan or the vertical scan, and determine a different context if the scan order is not the horizontal scan or the vertical scan. Video decoder 42 CABAC decodes the significance syntax elements (e.g., significance flags) based on the determined contexts (122).

Video encoder 32, as described in the flowchart of FIG. 6, and video decoder 42, as described in the flowchart of FIG. 9, may be configured to implement various other example techniques described in this disclosure. For example, to determine the contexts, video encoder 32 and video decoder 42 may be configured to determine contexts that are the same if the determined scan order is a horizontal scan or a vertical scan, and to determine different contexts if the determined scan order is neither the horizontal scan nor the vertical scan (e.g., a diagonal scan).

In some examples, to determine the contexts, video encoder 32 and video decoder 42 may be configured to determine a first set of contexts for the significance syntax elements if the scan order is a first scan order, and determine a second set of contexts for the significance syntax elements if the scan order is a second scan order. In some of these examples, the first set of contexts is the same as the second set of contexts if the first scan order is a horizontal scan and the second scan order is a vertical scan. In some of these examples, the first set of contexts is different than the second set of contexts if the first scan order is one of a horizontal scan or a vertical scan and the second scan order is not the horizontal scan or the vertical scan.

In some examples, video encoder 32 and video decoder 42 may determine a size of the block. In some of these examples, video encoder 32 and video decoder 42 may determine the contexts based on the scan order and the determined size of the block. As one example, video encoder 32 and video decoder 42 may determine, based on the determined size of the block, contexts for the significance syntax elements of the transform coefficients that are the same for all scan orders (i.e., for some block sizes, the contexts are the same for all scan orders).

For example, video encoder 32 and video decoder 42 may determine whether the size of the block is a first size or a second size. One example of the first size is the 4×4 block, and one example of the second size is the 8×8 block. If the size of the block is the first size (e.g., the 4×4 block), video encoder 32 and video decoder 42 may determine contexts that are the same for all scan orders (e.g., contexts that are the same for the diagonal, horizontal, and vertical scans for the 4×4 block). If the size of the block is the second size (e.g., the 8×8 block), video encoder 32 and video decoder 42 may determine contexts that are different for at least two different scan orders (e.g., the contexts for the diagonal scan of the 8×8 block are different from the contexts for the horizontal or vertical scan of the 8×8 block, but the contexts for the horizontal and vertical scans of the 8×8 block may be the same).

The following describes various additional techniques for improving the manner in which transform coefficients are coded, such as transform coefficients resulting from intra-coding, as one example. However, the techniques may be applicable to other examples as well, such as for inter-coding. The following techniques can be used individually or in conjunction with any of the other techniques described in this disclosure. Moreover, the techniques described above may be used in conjunction with any of the following techniques, or may be implemented separately from any of the following techniques.

In some examples, video encoder 32 and video decoder 42 may utilize one scan order to determine the location of the last significant coefficient. Video encoder 32 and video decoder 42 may utilize a different scan order to determine neighborhood contexts for the transform coefficients. Video encoder 32 and video decoder 42 may then code significance flags, level information, and sign information based on the determined neighborhood contexts. For example, video encoder 32 and video decoder 42 may utilize a horizontal or vertical scan (referred to as the nominal scan) to identify the last significant transform coefficient, and then utilize a diagonal scan on the 4×4 blocks or 4×4 sub-blocks (if an 8×8 block) to determine the neighborhood contexts.

In some examples, for 16×16 and 32×32 blocks, a neighborhood (in the transform domain) of the current coefficient being processed is used for derivation of the context used to code the significance flag for the coefficient. Similarly, in JCTVC-H0228, a neighborhood is used for coding significance as well as level information for all block sizes. Using neighborhood-based contexts for 4×4 and 8×8 blocks may improve the coding efficiency of HEVC. But if the existing neighborhoods used for significance maps in some other techniques are used with horizontal or vertical scans, the ability to derive contexts in parallel may be affected. Hence, in some examples, a scheme is described which uses certain aspects of horizontal and vertical scans with the neighborhood used for significance coding from some other techniques.

This is accomplished as follows. In some examples, first the position of the last significant coefficient in the scan order is coded in the bit-stream. This is followed by the significance map for a subset of 16 coefficients (a 4×4 sub-block in case of a 4×4 sub-block based diagonal scan) in backwards scan order, followed by coding passes for level information and sign. It should be noted that the position of the last significant coefficient depends directly on the specific scan that is used. An example of this is shown in FIG. 10.

FIG. 10 is a conceptual diagram illustrating positions of a last significant coefficient depending on the scan order. FIG. 10 illustrates block 124. The transform coefficients shown with solid circles are significant. For a horizontal scan, the position of the last significant coefficient is (1, 2) in (row, column) format (transform coefficient 128). For a 4×4 sub-block based diagonal scan (up-right), the position of the last significant coefficient is (0, 3) (transform coefficient 126).

In this example, for horizontal or vertical scans, the last significant coefficient position is still determined and coded based on the nominal scan. But then, for coding significance, level and sign information, the block is scanned using a 4×4 sub-block based diagonal scan starting with the bottom-right coefficient and proceeding backwards to the DC coefficient. If it can be derived from the position of the last significant coefficient that a particular coefficient is not significant, no significance, level or sign information is coded for that coefficient.

An example of this approach is shown in FIG. 11 for a horizontal scan. FIG. 11 is a conceptual diagram illustrating use of a diagonal scan in place of an original horizontal scan. FIG. 11 illustrates block 130. The coefficients with solid fill are significant. The position of the last significant coefficient, assuming a horizontal scan, is (1, 1) (transform coefficient 132). All coefficients with row indices greater than 1 can be inferred to be not significant. Similarly, all coefficients with row index 1 and column index greater than 1 can be inferred to be not significant. Further, the coefficient at (1, 1) can be inferred to be significant, although its level and sign information cannot be inferred. For coding of significance, level and sign information, a backward 4×4 sub-block based diagonal scan is used. Starting with the bottom-right coefficient, the significance flags are encoded. The significance flags that can be inferred are not explicitly coded. A neighborhood-based context is used for coding of significance flags. The neighborhood may be the same as that used for 16×16 and 32×32 blocks, or a different neighborhood may be used. It should be noted that, similar to above, separate sets of neighborhood-based contexts may be used for the different scans (horizontal, vertical, and 4×4 sub-block). Also, the contexts may be shared between different block sizes.
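
The inference described above for a nominal horizontal scan may be sketched as follows, matching the FIG. 11 example in which the last position (1, 1) implies that coefficients with a row index greater than 1, or with row index 1 and a column index greater than 1, are insignificant. The function names are illustrative.

    // Sketch only: a coefficient that follows the coded last significant
    // position in horizontal scan order needs no explicit syntax.
    bool inferredInsignificant(int row, int col, int lastRow, int lastCol) {
        return row > lastRow || (row == lastRow && col > lastCol);
    }

    // The coded last position itself is known to be significant.
    bool inferredSignificant(int row, int col, int lastRow, int lastCol) {
        return row == lastRow && col == lastCol;
    }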

In another example, any of various techniques, such as those of JCTVC-H0228, may be used for coding significance, level and sign information for 4×4 and 8×8 blocks after the position of the last significant coefficient is coded assuming the nominal scan. For coding of significance, level and sign information, a 4×4 sub-block based diagonal scan may be used.

It should be noted that the method is not restricted to horizontal, vertical and 4×4 sub-block based diagonal scans. The basic principle is to send the last significant coefficient position assuming the nominal scan and then code the significance (and possibly level and sign) information using another scan which uses neighborhood-based contexts. Similarly, although the techniques have been described for 4×4 and 8×8 blocks, they can be extended to any block size where horizontal and/or vertical scans may be used.

In one example, rather than utilizing separate contexts for each transform coefficient based on its position in the transform block, the video coder (e.g., video encoder 32 or video decoder 42) may determine which context to use for coding a transform coefficient based on the row index or the column index of the transform coefficient. For example, for a horizontal scan, all transform coefficients in the same row may share the same context, and the video coder may utilize different contexts for transform coefficients in the different rows. For a vertical scan, all transform coefficients in the same column may share the same context, and the video coder may utilize different contexts for transform coefficients in the different columns.
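
This row- or column-based sharing may be illustrated by the following sketch, where the returned integer indexes into a context set and the diagonal fallback of row plus column is only one possible choice for other scans.

    enum Scan { DIAG, HORIZ, VERT };

    // Sketch only: coefficients in the same row (horizontal scan) or the same
    // column (vertical scan) share one context.
    int positionContext(Scan scan, int row, int col) {
        if (scan == HORIZ) return row;
        if (scan == VERT)  return col;
        return row + col;  // e.g., the anti-diagonal index for other scans
    }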

Some other techniques may use multiple context sets based on coefficient position for coding of significance maps for block sizes of 16×16 and higher. Similarly, JCTVC-H0228 (and also HM5.0) uses the sum of row and column indices to determine the context set. In the case of JCTVC-H0228, this is done even for horizontal and vertical scans.

In some example techniques of this disclosure, the context set used to code the significance or level for a particular coefficient for a horizontal scan may depend only on the row index of the coefficient. Similarly, the context set used to code the significance or level for a coefficient in the case of a vertical scan may depend only on the column index of the coefficient.

In some example techniques of this disclosure, the context set may depend only on the absolute index of the coefficient in the scan. Different scans may use different functions to derive the context set.

Furthermore, as described above, horizontal, vertical and 4×4 sub-block-based diagonal scans may use separate context sets, or the horizontal and vertical scans may share context sets. In some examples, not only the context set but also the context itself depends only on the absolute index of the coefficient in the scanning order.

In some examples, the video coder (e.g., video encoder 32 or video decoder 42) may be configured to implement only one type of scan (e.g., a diagonal scan). However, the neighboring regions that the video coder evaluates may be based on the nominal scan. The nominal scan is the scan the video coder would have performed had the video coder been able to perform other scans. For instance, video encoder 32 may signal that the horizontal scan is to be used. However, video decoder 42 may implement the diagonal scan instead, but the neighboring regions that the video coder evaluates may be based on the signaling that the horizontal scan is to be used. The same would apply for the vertical scan.

In some examples, if the nominal scan is the horizontal scan, then the video coder may stretch the neighboring region that is evaluated in the horizontal direction relative to the regions that are currently used. The same would apply when the nominal scan is the vertical scan, but in the vertical direction. The stretching of the neighboring region may be referred to as varying the region. For example, if the nominal scan is horizontal, then rather than evaluating a transform coefficient that is two rows down from where the current transform coefficient being coded is located, the video coder may evaluate the transform coefficient that is three columns apart from where the current transform coefficient is located. The same would apply when the nominal scan is the vertical scan, but the transform coefficient would be located three rows apart from where the current transform coefficient (e.g., the one being coded) is located.

FIG. 12 is a conceptual diagram illustrating a context neighborhood for a nominal horizontal scan. FIG. 12 illustrates 8×8 block 134 that includes 4×4 sub-blocks 136A-136D. Compared to the context neighborhood in some other techniques, the coefficient two rows down has been replaced by the coefficient that is in the same row but three columns apart (X₄). Similarly, if the nominal scan is vertical, a context neighborhood that is stretched in the vertical direction may be used.
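
The stretched template of FIG. 12 may be illustrated with the following sketch of neighborhood offsets relative to the current coefficient; the exact five-point template is an assumption for illustration rather than a normative definition.

    #include <array>
    #include <utility>

    using Offset = std::pair<int, int>;  // (row offset, column offset)

    // Sketch only: for a nominal horizontal scan, the position two rows down
    // is replaced by one three columns to the right, stretching the template
    // along the row, as in FIG. 12.
    std::array<Offset, 5> neighborhood(bool nominalHorizontal) {
        if (nominalHorizontal) {
            std::array<Offset, 5> stretched = {{{0, 1}, {0, 2}, {1, 0}, {1, 1}, {0, 3}}};
            return stretched;
        }
        std::array<Offset, 5> basic = {{{0, 1}, {0, 2}, {1, 0}, {1, 1}, {2, 0}}};
        return basic;
    }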

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

What is claimed is:
1. A method for decoding video data, the method comprising: receiving, from a coded bitstream, significance flags of transform coefficients of a block; determining a scan order for the transform coefficients of the block; determining contexts for the significance flags of the transform coefficients of the block based on the determined scan order; and context adaptive binary arithmetic coding (CABAC) decoding the significance flags of the transform coefficients based at least on the determined contexts.
2. The method of claim 1, wherein determining the contexts comprises determining the contexts based on size of the block, positions of the transform coefficients within the block, and the scan order.
3. The method of claim 1, wherein determining the contexts comprises: determining the contexts that are the same if the determined scan order is a horizontal scan and if the determined scan order is a vertical scan; and determining the contexts, which are different than the contexts if the determined scan order is the horizontal scan and if the determined scan order is the vertical scan, if the determined scan order is not the horizontal scan or the vertical scan.
4. The method of claim 1, wherein determining contexts for the significance flags of the transform coefficients of the block based on the determined scan order comprises determining the same contexts if the scan order is horizontal scan order or vertical scan order.
5. The method of claim 1, wherein determining the contexts comprises: determining a first set of contexts for the significance flags if the scan order is a first scan order; and determining a second set of contexts for the significance flags if the scan order is a second scan order.
6. The method of claim 5, wherein the first set of contexts is the same as the second set of contexts if the first scan order is a horizontal scan and the second scan order is a vertical scan.
7. The method of claim 5, wherein the first set of contexts is different than the second set of contexts if the first scan order is one of a horizontal scan or a vertical scan and the second scan order is not the horizontal scan or the vertical scan.
8. The method of claim 1, wherein determining the contexts comprises determining the contexts for the significance flags of the transform coefficients of the block based on the determined scan order and based on size of the block.
9. The method of claim 1, further comprising: determining whether size of the block is a first size or a second size, wherein, if the size of the block is the first size, determining the contexts comprises determining the contexts that are the same for all scan orders, and wherein, if the size of the block is the second size, determining the contexts comprises determining the contexts that are different for at least two different scan orders.
10. The method of claim 1, wherein the block comprises an 8×8 block of transform coefficients.
11. A method for encoding video data, the method comprising: determining a scan order for transform coefficients of a block; determining contexts for significance flags of the transform coefficients of the block based on the determined scan order; context adaptive binary arithmetic coding (CABAC) encoding the significance flags of the transform coefficients based at least on the determined contexts; and signaling the encoded significance flags in a coded bitstream.
12. The method of claim 11, wherein determining the contexts comprises determining the contexts based on size of the block, positions of the transform coefficients within the block, and the scan order.
13. The method of claim 11, wherein determining the contexts comprises: determining the contexts that are the same if the determined scan order is a horizontal scan and if the determined scan order is a vertical scan; and determining the contexts, which are different than the contexts if the determined scan order is the horizontal scan and if the determined scan order is the vertical scan, if the determined scan order is not the horizontal scan or the vertical scan.
14. The method of claim 11, wherein determining contexts for the significance flags of the transform coefficients of the block based on the determined scan order comprises determining the same contexts if the scan order is horizontal scan order or vertical scan order.
15. The method of claim 11, wherein determining the contexts comprises: determining a first set of contexts for the significance flags if the scan order is a first scan order; and determining a second set of contexts for the significance flags if the scan order is a second scan order.
16. The method of claim 15, wherein the first set of contexts is the same as the second set of contexts if the first scan order is a horizontal scan and the second scan order is a vertical scan.
17. The method of claim 15, wherein the first set of contexts is different than the second set of contexts if the first scan order is one of a horizontal scan or a vertical scan and the second scan order is not the horizontal scan or the vertical scan.
18. The method of claim 11, wherein determining the contexts comprises determining the contexts for the significance flags of the transform coefficients of the block based on the determined scan order and based on size of the block.
19. The method of claim 11, wherein the block comprises an 8×8 block of transform coefficients.
20. An apparatus for coding video data, the apparatus comprising a video coder configured to: determine a scan order for transform coefficients of a block; determine contexts for significance flags of the transform coefficients of the block based on the determined scan order; and context adaptive binary arithmetic coding (CABAC) code the significance flags of the transform coefficients based at least on the determined contexts.
21. The apparatus of claim 20, wherein the video coder comprises a video decoder, and wherein the video decoder is configured to: receive, from a coded bitstream, the significance flags of the transform coefficients of the block; and CABAC decode the significance flags of the transform coefficients based on the determined contexts.
22. The apparatus of claim 20, wherein the video coder comprises a video encoder, and wherein the video encoder is configured to: CABAC encode the significance flags of the transform coefficients based on the determined contexts; and signal, in a coded bitstream, the significance flags of the transform coefficients.
23. The apparatus of claim 20, wherein, to determine the contexts, the video coder is configured to determine the contexts based on size of the block, positions of the transform coefficients within the block, and the scan order.
24. The apparatus of claim 20, wherein, to determine the contexts, the video coder is configured to: determine the contexts that are the same if the determined scan order is a horizontal scan and if the determined scan order is a vertical scan; and determine the contexts, which are different than the contexts if the determined scan order is the horizontal scan and if the determined scan order is the vertical scan, if the determined scan order is not the horizontal scan or the vertical scan.
25. The apparatus of claim 20, wherein, to determine contexts for the significance flags of the transform coefficients of the block based on the determined scan order, the video coder is configured to determine the same contexts if the scan order is horizontal scan order or vertical scan order.
26. The apparatus of claim 20, wherein, to determine the contexts, the video coder is configured to: determine a first set of contexts for the significance flags if the scan order is a first scan order; and determine a second set of contexts for the significance flags if the scan order is a second scan order.
27. The apparatus of claim 26, wherein the first set of contexts is the same as the second set of contexts if the first scan order is a horizontal scan and the second scan order is a vertical scan.
28. The apparatus of claim 26, wherein the first set of contexts is different than the second set of contexts if the first scan order is one of a horizontal scan or a vertical scan and the second scan order is not the horizontal scan or the vertical scan.
29. The apparatus of claim 20, wherein, to determine the contexts, the video coder is configured to determine the contexts for the significance flags of the transform coefficients of the block based on the determined scan order and based on size of the block.
30. The apparatus of claim 20, wherein the video coder is configured to: determine whether size of the block is a first size or a second size, wherein, if the size of the block is the first size, the video coder is configured to determine the contexts that are the same for all scan orders, and wherein, if the size of the block is the second size, the video coder is configured to determine the contexts that are different for at least two different scan orders.
31. The apparatus of claim 20, wherein the block comprises an 8×8 block of transform coefficients.
32. The apparatus of claim 20, wherein the apparatus comprises one of: a microprocessor; an integrated circuit (IC); and a wireless communication device that includes the video coder.
33. An apparatus for coding video data, the apparatus comprising: means for determining a scan order for transform coefficients of a block; means for determining contexts for significance flags of the transform coefficients of the block based on the determined scan order; and means for context adaptive binary arithmetic coding (CABAC) the significance flags of the transform coefficients based at least on the determined contexts.
34. The apparatus of claim 33, wherein the means for determining the contexts comprises means for determining the contexts based on size of the block, positions of the transform coefficients within the block, and the scan order.
35. A computer-readable storage medium having instructions stored thereon that when executed cause one or more processors of an apparatus for coding video data to: determine a scan order for transform coefficients of a block; determine contexts for significance flags of the transform coefficients of the block based on the determined scan order; and context adaptive binary arithmetic coding (CABAC) code the significance flags of the transform coefficients based at least on the determined contexts.
36. The computer-readable storage medium of claim 35, wherein the instructions that cause the one or more processors to determine the contexts comprise instructions that cause the one or more processors to determine the contexts based on size of the block, positions of the transform coefficients within the block, and the scan order.
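
As an informal companion to claim 1, and purely as a sketch, the decoding flow recited there might be organized as follows in C. Every name here is hypothetical: CabacDecoder, cabac_decode_bin (stubbed out so the example is self-contained), significance_context, and the numeric context offsets stand in for a real entropy-decoding engine and are not taken from any standard.

    /* Illustrative scan orders, as in the earlier sketch. */
    typedef enum {
        SCAN_DIAGONAL,
        SCAN_HORIZONTAL,
        SCAN_VERTICAL
    } ScanOrder;

    /* Placeholder decoder state; a real CABAC engine tracks its
     * arithmetic-coding range and per-context probability models. */
    typedef struct { int placeholder; } CabacDecoder;

    /* Stub standing in for the arithmetic bin decoder. */
    static int cabac_decode_bin(CabacDecoder *dec, int ctx)
    {
        (void)dec;
        (void)ctx;
        return 0;
    }

    enum { BLOCK_DIM = 8 };  /* claims 10, 19, and 31 recite an 8x8 block */

    /* Scan-order-dependent context for one coefficient position.
     * Horizontal and vertical scans share a base offset; other scan
     * orders use a different one. The offsets are placeholders. */
    static int significance_context(ScanOrder scan, int x, int y)
    {
        int base = (scan == SCAN_HORIZONTAL || scan == SCAN_VERTICAL) ? 16 : 0;
        return base + (y / 2) * 4 + (x / 2);  /* coarse position bucket */
    }

    /* Claim 1 skeleton: for each coefficient, determine a context
     * from the determined scan order (and position, per claim 2),
     * then CABAC decode the significance flag. The loop below visits
     * positions in raster order; the actual per-scan visiting order
     * is elided for brevity. */
    static void decode_significance_flags(CabacDecoder *dec, ScanOrder scan,
                                          int flags[BLOCK_DIM][BLOCK_DIM])
    {
        for (int y = 0; y < BLOCK_DIM; ++y)
            for (int x = 0; x < BLOCK_DIM; ++x)
                flags[y][x] = cabac_decode_bin(dec,
                                               significance_context(scan, x, y));
    }

    int main(void)
    {
        CabacDecoder dec = { 0 };
        int flags[BLOCK_DIM][BLOCK_DIM];
        decode_significance_flags(&dec, SCAN_HORIZONTAL, flags);
        return 0;
    }

The sketch exists only to show the data flow of claim 1: each significance flag's context is a function of the determined scan order, and the CABAC engine produces the flag from that context.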